different data source
Clip_Dataset__NeurIPS2022_ (10)
[Figure-caption fragments from a NeurIPS 2022 CLIP-dataset paper: samples of the class "broom" visualized from the reference set; output-mixing results for two CLIP models trained on a YFCC-3M + CC-3M mixture and on RedCaps-3M, respectively, with the two models' logit predictions ensembled with equal weights; ensemble outputs of CLIPs trained on different data sources and dataset sizes (red and orange lines), taken from the same stage of training (i.e., epoch), lie on the linear trend; a note on data efficiency (i.e., how fast the error decreases); proofs of the main theoretical claims are given in Section 6, F.1 (Proof of Theorem 1).]
Federated Learning for Cross-Domain Data Privacy: A Distributed Approach to Secure Collaboration
Zhang, Yiwei, Liu, Jie, Wang, Jiawei, Dai, Lu, Guo, Fan, Cai, Guohui
This paper proposes a data privacy protection framework based on federated learning, which aims to enable effective cross-domain data collaboration while guaranteeing data privacy through distributed learning. Federated learning greatly reduces the risk of privacy breaches by training the model locally on each client and sharing only model parameters rather than raw data. Experiments verify the efficiency and privacy-protection ability of federated learning across different data sources through simulations on medical, financial, and user data. The results show that federated learning not only maintains high model performance in a multi-domain data environment but also ensures effective protection of data privacy. This research provides a new technical path for cross-domain data collaboration and promotes privacy-preserving large-scale data analysis and machine learning.
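The parameter-sharing scheme the abstract describes can be sketched as a minimal FedAvg-style loop. Everything below (the linear model, the simulated client data, and all hyperparameters) is invented for illustration; a real deployment would add secure aggregation and often differential privacy on top of this skeleton.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on squared error.
    Raw data (X, y) never leaves the client; only the weights are returned."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Each client trains locally; the server averages the returned
    parameters, weighted by each client's dataset size."""
    updates = [local_update(global_w, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):  # three simulated domains (e.g. medical, financial, user)
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(30):       # communication rounds
    w = federated_round(w, clients)
print(np.round(w, 2))     # recovers weights close to true_w
```

The key property is visible in the loop: the server only ever sees `updates`, never the `(X, y)` pairs held by the clients.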
Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis
Petrov, Nikolay B, Serapio-García, Gregory, Rentfrow, Jason
The humanlike responses of large language models (LLMs) have prompted social scientists to investigate whether LLMs can be used to simulate human participants in experiments, opinion polls and surveys. Of central interest in this line of research has been mapping out the psychological profiles of LLMs by prompting them to respond to standardized questionnaires. The conflicting findings of this research are unsurprising given that mapping out underlying, or latent, traits from LLMs' text responses to questionnaires is no easy task. To address this, we use psychometrics, the science of psychological measurement. In this study, we prompt OpenAI's flagship models, GPT-3.5 and GPT-4, to assume different personas and respond to a range of standardized measures of personality constructs. We used two kinds of persona descriptions: either generic (four or five random person descriptions) or specific (mostly demographics of actual humans from a large-scale human dataset). We found that the responses from GPT-4, but not GPT-3.5, using generic persona descriptions show promising, albeit not perfect, psychometric properties similar to human norms, but the data from both LLMs using specific demographic profiles show poor psychometric properties. We conclude that, currently, when LLMs are asked to simulate silicon personas, their responses are poor signals of potentially underlying latent traits. Thus, our work casts doubt on LLMs' ability to simulate individual-level human behaviour across multiple-choice question-answering tasks.
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
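A typical first step when assessing the psychometric properties of questionnaire responses is an internal-consistency check, most commonly Cronbach's alpha. The formula below is the standard one; the Likert-scale response matrix is invented for illustration and is not from the study above.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for an (n_respondents, n_items) response matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)   # per-item sample variance
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of sum scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: 6 respondents, one 4-item scale.
data = [[4, 5, 4, 4],
        [2, 2, 3, 2],
        [5, 4, 5, 5],
        [3, 3, 3, 4],
        [1, 2, 1, 2],
        [4, 4, 5, 4]]
print(round(cronbach_alpha(data), 2))  # high alpha: items move together
```

Values near 1 indicate that the items measure a common underlying trait; values near or below 0 (as reported for the specific-persona conditions) indicate the responses carry little shared signal.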
Unsupervised Change Point Detection for heterogeneous sensor signals
Abstract—Change point detection is a crucial aspect of analyzing time series data, as the presence of a change point indicates an abrupt and significant change in the process generating the data. While many algorithms for change point detection have been developed over time, it can be challenging to select the appropriate algorithm for a specific problem; the choice heavily depends on the nature of the problem and the underlying data source. One motivating application is the identification of momentum turning points in time series, when a trend reverses from an uptrend to a downtrend. This article presents an overview and comparison of algorithms commonly used for detecting change points in time series data. The focus is on unsupervised change point detection, which involves segmenting the data without relying on large amounts of annotated training data or the need to re-calibrate the model for each data source; we exclusively examine unsupervised techniques because of their flexibility in application to various data sources. The goal of this article is to help choose the right detection method for a particular application, with an emphasis on practical aspects such as implementation and parameter calibration. The examined methods are introduced and evaluated, and our selection aims for good general performance across different data sources without fine-tuning the algorithm.
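As a concrete example of an unsupervised detector whose only calibration is a threshold and a drift parameter, here is a minimal two-sided CUSUM sketch. The signal and parameter values are invented for illustration; CUSUM is just one of the many methods such an overview would compare.

```python
import numpy as np

def cusum_changepoints(x, threshold=5.0, drift=0.5):
    """Two-sided CUSUM: flag a change point when the cumulative deviation
    from a running reference mean exceeds `threshold`; `drift` suppresses
    small fluctuations. State resets after each detection."""
    x = np.asarray(x, dtype=float)
    pos = neg = 0.0
    ref, n_ref = x[0], 1
    changes = []
    for i, v in enumerate(x[1:], start=1):
        pos = max(0.0, pos + v - ref - drift)   # accumulates upward shifts
        neg = max(0.0, neg + ref - v - drift)   # accumulates downward shifts
        if pos > threshold or neg > threshold:
            changes.append(i)
            pos = neg = 0.0
            ref, n_ref = v, 1                   # restart reference after change
        else:
            ref = (ref * n_ref + v) / (n_ref + 1)  # running mean as reference
            n_ref += 1
    return changes

rng = np.random.default_rng(1)
signal = np.concatenate([rng.normal(0, 0.3, 100),   # regime 1
                         rng.normal(3, 0.3, 100)])  # abrupt mean shift at t=100
print(cusum_changepoints(signal))  # detects the shift shortly after index 100
```

No labeled training data is needed: the detector only compares incoming samples against its own running statistics, which is what makes methods of this family portable across data sources.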
Mapping Climate Change Research via Open Repositories & AI: advantages and limitations for an evidence-based R&D policy-making
Bovenzi, Nicandro, Duran-Silva, Nicolau, Massucci, Francesco Alessandro, Multari, Francesco, Parra-Rojas, César, Pujol-Llatse, Josep
In the last few years, several initiatives have started to offer access to research-output data and metadata in an open fashion. The platforms developed by these initiatives are opening up scientific production to the wider public, and they can be an invaluable asset for evidence-based policy-making in Science, Technology and Innovation (STI). These resources can facilitate knowledge discovery and help identify available R&D assets and relevant actors within specific research niches of interest. Ideally, to gain a comprehensive view of entire STI ecosystems, the information provided by each of these resources should be combined and analysed accordingly. To this end, at least a certain degree of interoperability should be guaranteed across data sources, so that data can be better aggregated and complemented, and so that the evidence provided for policy-making is more complete and reliable. Here, we study whether this is the case for mapping Climate Action research across the whole Danish STI ecosystem, using four popular open-access STI data sources: OpenAire, Open Alex, CORDIS and Kohesio.
- Europe > Denmark > Capital Region > Copenhagen (0.15)
- North America > United States (0.14)
- Europe > Denmark > North Jutland > Aalborg (0.05)
- (3 more...)
- Health & Medicine (1.00)
- Government (1.00)
- Energy > Renewable (0.93)
AI for business users: a glossary
When you work with IT staff and data scientists, they will use acronyms you may not be familiar with, so knowing the basic terms is essential for clear communication. Business users should familiarize themselves with the common AI terms below to communicate well with data teams. Artificial intelligence is a form of intelligence demonstrated by a computer: a computer can be programmed with logic and business rules that enable it to "reason" through situations and reach a conclusion.
Flenner
Integrating information from many different data sources to provide better situational awareness is an essential Navy issue. Many data fusion models use statistical methods to reduce statistical errors. Machine learning and big data, on the other hand, provide a unique framework for information fusion through our ability to learn what added benefit a different modality can provide. In this work, we provide a novel data fusion method that integrates relational data, given to us in the form of a graph, with image data. Using a graphical model, we build an energy model that learns a representation of the data in which different data sources are assumed to be similar. The energy model is a non-convex function, which we optimize using stochastic gradient descent with momentum. The effectiveness of the model is demonstrated on an automated target recognition example.
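The optimizer named in the abstract, stochastic gradient descent with momentum on a non-convex energy, can be sketched as follows. The toy energy below (a quadratic term plus a sinusoidal ripple that makes it non-convex) is invented for illustration and is not the paper's graph-plus-image energy; only the update rule is the standard one.

```python
import numpy as np

def energy_grad(w, batch):
    """Gradient of a toy non-convex energy:
    E(w) = mean_x ||w - x||^2 - (1/4pi) * sum cos(2*pi*w)."""
    grad_quad = 2 * (w - batch).mean(axis=0)
    grad_ripple = 0.5 * np.sin(2 * np.pi * w)  # non-convex ripple term
    return grad_quad + grad_ripple

def sgd_momentum(data, dim, lr=0.05, beta=0.9, steps=500, batch_size=8, seed=0):
    """SGD with (heavy-ball) momentum on mini-batches of the data."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    v = np.zeros(dim)
    for _ in range(steps):
        batch = data[rng.integers(0, len(data), batch_size)]
        v = beta * v - lr * energy_grad(w, batch)  # momentum accumulates past grads
        w = w + v
    return w

# Samples clustered around 1.0, where the ripple term also has a minimum.
data = np.random.default_rng(2).normal(loc=1.0, scale=0.2, size=(200, 3))
w_opt = sgd_momentum(data, dim=3)
print(np.round(w_opt, 2))  # near the data mean despite the non-convex ripples
```

Momentum (`beta`) is what lets the iterate coast through the shallow ripples that would stall plain gradient descent on energies like this.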
Knowledge Management of the Future: Linking Humans and AI
Knowledge remains a critical component when it comes to standing out from the competition. Collaboration between people and machines offers businesses massive potential for their future knowledge management. However, technologies today are making traditional notions of knowledge management obsolete. Organizations need to rethink their handling of data and processes and encourage human-machine interaction to benefit from these changes. In the digital age, companies generate and store a veritable flood of data.
Multi-stream Data Analytics for Enhanced Performance Prediction in Fantasy Football
Bonello, Nicholas, Beel, Joeran, Lawless, Seamus, Debattista, Jeremy
Fantasy Premier League (FPL) performance predictors tend to base their algorithms purely on historical statistical data. The main problem with this approach is that external factors such as injuries, managerial decisions and other tournament match statistics can never be factored into the final predictions. In this paper, we present a new method for predicting future player performances by automatically incorporating human feedback into our model. By combining statistical data analysis (such as previous performances and upcoming fixture difficulty ratings) with betting-market analysis and the opinions of the general public and experts alike via social media and web articles, we can improve our understanding of who is likely to perform well in upcoming matches. When tested on the English Premier League 2018/19 season, the model outperformed regular statistical predictors by over 300 points, an average of 11 points per week, ranking 30,000th out of over 6.5 million players, within the top 0.5%.
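The multi-stream idea, blending a purely statistical forecast with external signals such as fixture difficulty and public sentiment, can be sketched as a weighted combination. The weights, inputs, and scaling below are invented for illustration and are not the paper's model; only the 1-to-5 fixture-difficulty scale is a real FPL convention.

```python
def predict_points(history, fixture_difficulty, sentiment,
                   w_hist=0.6, w_fix=0.25, w_sent=0.15):
    """Blend three streams into one expected-points score.
    history: recent match points; fixture_difficulty: FPL scale 1 (easy) to 5
    (hard); sentiment: social-media signal in [-1, 1]."""
    recent = history[-5:]
    form = sum(recent) / len(recent)          # recent average points
    ease = (5 - fixture_difficulty) / 4       # map 1..5 -> 1.0 .. 0.0
    return w_hist * form + w_fix * ease * 10 + w_sent * sentiment * 10

# Player A: strong recent form, easy fixture, positive public sentiment.
a = predict_points([6, 8, 2, 9, 7], fixture_difficulty=2, sentiment=0.7)
# Player B: similar form, but a hard fixture and injury rumours in the press.
b = predict_points([7, 6, 5, 8, 6], fixture_difficulty=5, sentiment=-0.4)
print(round(a, 2), round(b, 2))  # A ranks above B despite comparable form
```

The point of the blend is visible in the example: two players with nearly identical historical form are separated by the external streams that a purely statistical predictor would ignore.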